Exploration via Epistemic Value Estimation
Authors
Abstract
How to efficiently explore in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions, for instance to compute an exploration bonus or upper confidence bound. Unfortunately, the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators. It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
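Many exploration algorithms use the epistemic uncertainty of value predictions to compute an upper confidence bound. The snippet below is a generic illustration of that idea. It is not EVE itself, because the abstract does not describe how EVE builds its posterior.

The sketch makes three assumptions:
- Several Q-value samples have already been drawn from some posterior over Q-function parameters.
- Their spread is used as a proxy for epistemic uncertainty.
- Each action is scored as mean + β · standard deviation.

The function name `ucb_action` and the parameter `beta` are hypothetical.

```python
import statistics
from typing import Sequence


def ucb_action(q_samples: Sequence[Sequence[float]], beta: float = 1.0) -> int:
    """Return the action with the highest optimistic (UCB-style) score.

    Hypothetical helper: q_samples[k][a] is the value of action a under the
    k-th sample from an assumed posterior over Q-function parameters.
    """
    if len(q_samples) < 2:
        # statistics.stdev needs at least two data points.
        raise ValueError("need at least two posterior samples to estimate spread")
    num_actions = len(q_samples[0])
    scores = []
    for a in range(num_actions):
        values = [sample[a] for sample in q_samples]
        mean = statistics.fmean(values)
        # Spread across posterior samples stands in for epistemic uncertainty.
        std = statistics.stdev(values)
        scores.append(mean + beta * std)
    # Act greedily with respect to the optimistic scores.
    return max(range(num_actions), key=lambda a: scores[a])


# Action 0 is certain (value 1.0 in every sample). Action 1 has a lower mean
# but high disagreement between samples, so optimism favours exploring it.
samples = [[1.0, 0.9], [1.0, 1.5], [1.0, 0.3]]
print(ucb_action(samples, beta=1.0))
print(ucb_action(samples, beta=0.0))
```

With `beta=0` the rule reduces to acting greedily on the posterior mean. Larger `beta` values push the agent toward actions whose values the samples disagree on.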
Similar resources
Epistemic Value
The value turn has numerous motivations; I’ll discuss three of them. The first of these motivations consists in the ancient roots of the idea that epistemology is in some important way value-theoretic. Plato and Aristotle both took epistemic states such as knowledge and understanding to have particular and distinctive value. So too did their immediate and medieval followers (Zagzebski 2001). Th...
Generalization and Exploration via Randomized Value Functions
We propose randomized least-squares value iteration (RLSVI) – a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or ε-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains...
Deep Exploration via Randomized Value Functions
We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through c...
Exploration via Model-based Interval Estimation
This paper takes an empirical approach to evaluating three model-based reinforcement-learning methods. All methods intend to speed the learning process by mixing exploitation of learned knowledge with exploration of possibly promising alternatives. We consider ε-greedy exploration, which is computationally cheap and popular, but unfocused in its exploration effort; R-Max exploration, a simplifica...
Pragmatic Encroachment and Epistemic Value
Does knowledge matter? There are actually at least two questions behind this broad one. The first is whether the value of knowledge is independent from other epistemic values, such as the value of truth, or the value of having true beliefs. The second is whether knowledge, as an epistemic value, is independent from other values, such as the good or freedom, which are practical or ethical values...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i8.26164